Abstract. Security protocols often use randomization to achieve probabilistic non-determinism. This
non-determinism, in turn, is used to obfuscate the dependence of observable values on secret data.
Since the correctness of security protocols is very important, formal analysis of security protocols
has been widely studied in the literature. Randomized security protocols have also been analyzed using
formal techniques such as process calculi and probabilistic model checking. In this paper, we consider
the problem of validating implementations of randomized protocols. Unlike previous approaches, which
treat the protocol as a white box, our approach verifies an implementation provided as a black
box. Our goal is to infer the secrecy guarantees provided by a security protocol through statistical
techniques. We learn the probabilistic dependency of the observable outputs on the secret inputs using
a Bayesian network, and then use it to approximate the leakage of the secret. To evaluate the
accuracy of our statistical approach, we compare our technique with probabilistic model checking
on two examples: the Crowds protocol and the dining cryptographers protocol.



1 Introduction 



A number of randomized protocols have been proposed to ensure the secrecy of certain facts which
must not be disclosed, even though some consequences of these facts have to be made observable. There
could be several motivations for this secrecy, such as individual privacy expectations or negotiations
among mutually untrusted parties. For example, a voting machine would be expected to preserve the
anonymity of the voter while recording his vote; hence, the vote recording protocol would periodically
reorder the recorded votes at random. Other, more involved examples of randomized protocols are
contract-signing protocols, privacy-preserving auction protocols and the Crowds protocol for routing
messages. These protocols hide secret information by using randomization to obfuscate the relation
between the secret data and the observable data. Their goal is to make it difficult for an attacker
to learn the secret from the observable data.

Since the correctness of security protocols is very important, formal verification of security protocols
has been widely studied in the literature. Deterministic protocols can be modeled as labeled transition
systems, and formal techniques such as model checking can be used for state exploration to verify
properties expressed in temporal logic. Randomized security protocols can be modeled as discrete time
Markov chains (DTMCs) or Markov decision processes (MDPs), and probabilistic model checking techniques
can be used to verify properties expressed in stochastic temporal logic, that is, temporal logic
augmented with probabilities.

A key issue with these randomized protocols is that their implementations can be imperfect. For
example, in the Dining Cryptographers protocol |TU], the coins used by the cryptographers might
be biased, which might reveal probabilistic information about which cryptographer paid. As another
example, in the Crowds protocol [TH], the crowd might be infested by moles which provide their
observations to the adversary. An adversary might use these observations to guess the sender of a
message with greater accuracy.

The goal of this work is to develop a statistical technique for analyzing protocol implementations
and quantifying the loss of anonymity. In validating implementations of security protocols, we make
the following assumptions.



1. The implementation is not created in a hostile environment, and any implementation error is an
unintended bug, such as the use of a poor pseudo-random number generator. If the implementation is
hostile, it can contain bugs which cannot easily be detected by random sampling. For example, a Crowds
protocol implementation which fails to anonymize only a particular path would not necessarily be
detected as erroneous by our technique.

2. The implementation might have other vulnerabilities which make it possible to compromise it. Our 
analysis is limited to secrecy guarantees provided by the implementation and not to whether it is 
vulnerable to attacks. 

3. The quality of the source of randomization while testing is the same as when the implementation is 
deployed. If the source of randomization deteriorates and becomes more deterministic, the secrecy 
guarantees checked during testing will no longer hold true. 

We assume that the implementation of the protocol is provided to us as a black box. Unlike protocols,
which are public, implementations could be executable binaries, hardware implementations or IP cores,
and we may only be able to check an implementation's correctness by observing its inputs and outputs.
Our approach to learning probabilistic dependencies is inspired by work on reverse engineering
genetic networks [17114121], where a similar problem exists. Molecular and cellular processes form
complex stochastic feedback networks in which the regulatory molecules that control the expression
of genes are themselves controlled by other genes. Hence, an important problem is to reconstruct
functional network architectures from observed time series of gene expression data, which requires
discovering probabilistic dependencies among the different genes. Similarly, in verifying an
implementation of a randomized protocol, we can obtain a number of sample traces of its execution
(over both secret and observed values) and use them to construct the dependency of the observed
values on the secret values.

Once the probabilistic dependency of the observable values on the secret data has been learnt
statistically from the traces of the randomized protocol implementation, measuring the information
leaked by the security protocol is similar to the channel capacity estimation problem. To validate the
accuracy of our technique, we compare its estimates with those obtained from probabilistic model
checking, using the two examples from [5]¹: dining cryptographers and crowds.
The novel contributions made in this paper are:

1. To our knowledge, this is the first completely statistical approach to verifying probabilistic
security protocols. It does not require modeling the protocol as a DTMC or MDP; it only requires
sampling the traces obtained by running the protocol implementation. We are not aware of any existing
black-box testing approach for verifying security protocols.

2. We experimentally compare our technique with probabilistic model checking.

The rest of the paper is organized as follows. In Section 2, we discuss related work. We present our
statistical approach in Section 3. The experimental analysis of our approach is presented in Section 4.
We mention some limitations of our work in Section 5 and conclude by identifying some future work
in Section 6.



2 Related Works 

The characterization of secrecy loss in cryptographic protocols in terms of channel capacity and mutual
information has been well studied in the literature, and several efforts have been made towards the
validation of randomized security protocols. In this section, we briefly summarize some of the relevant
work.
Secrecy provided by protocols is measured in terms of the entropy of the observable data in [20]. This
work uses the attacker's lack of information as the notion of anonymity or secrecy. Thus, it
implicitly assumes a uniform distribution of the secret data.

An alternative approach [15] is to consider the mutual information between the observed data and the
secret data. Here the secrecy leak is modeled as a covert channel arising from the imperfect nature of
the security protocol, and the maximum of this mutual information with respect to the input
distribution, referred to as the channel capacity, can be used as a measure of worst-case secrecy loss.
Since it is maximized over all input distributions, this measure depends only on the protocol and not
on the distribution of the secrets.

¹ The models used for probabilistic model checking in [5] are available online at http://www.prismmodelchecker.org.
Different notions of quantitative information flow 1718) have been developed in the context of analyzing
information flow in programs. Many of these techniques assume some distribution (mostly uniform)
on the secret input space and only provide guarantees for such input spaces. Clarkson, Myers and
Schneider define the information leaked as the difference between the attacker's prior guess and the
attacker's posterior belief after the observation has been made. For example, an attacker could
have some probabilistic model of the relation between the secret F and the observable data G, as well
as some guess about the secret facts F. After observing G, he can revise his guess. The incremental
gain in the attacker's knowledge is defined to be the information flow from the secret to the observable
data. The metric used in this work to measure the difference between probability distributions is the
Kullback-Leibler distance.

Another approach, based on hypothesis testing, is described by Di Pierro et al. [IS]. Here, attacks are
seen as experiments conducted by an attacker to validate or revise his hypotheses about the secret.
A similar approach by Chatzikokolakis et al. [4] considers the problem of calculating the errors of
randomized protocols as determining how well an attacker can estimate the secret using the maximum
a posteriori rule when the a priori distribution is not known. They characterize this as the Bayes risk.
The two existing approaches for verifying randomized protocols that are most similar to our work are
the probabilistic variant of the pi-calculus |9l3j and probabilistic model checking pi. We use
probabilistic model checking as the comparison point in our experimental evaluation. Probabilistic
model checking [13] is an extension of model checking [T], which was initially developed as a formal
technique for finding bugs in circuits. It is used to model and validate systems which exhibit
stochastic behaviour. Generally, probabilistic systems are specified as DTMCs or MDPs, and the
conditional probability of an observable value given the secret data is computed as the probability
of reaching some state in the formal model.
A major problem with applying this technique to verify implementations of randomized protocols
is that one needs to abstract the implementation into a formal model. Even if the implementation is
available as source code or a hardware design, deriving the formal model from it is non-trivial:
errors might be overlooked, or new errors might be introduced, in the derivation. If the
implementation is only available as a black box, this technique cannot be used.

3 Statistical Approach 

In this section, we present our statistical approach to the analysis of randomized protocols. We start
with some preliminaries on information theory, which are used in the rest of the section. We also
briefly summarize the existing work on measuring loss of secrecy as channel capacity or mutual
information in order to formally define our problem. We then show how we can use traces of the
protocol implementation to learn a probabilistic dependency graph between the secrets and the
observable data. This dependency graph is a Bayesian network. We describe how we can learn the
structure and parameters of the network from the traces, and then show how to estimate the mutual
information from the learnt parameters.



In our approach, the randomized protocol to be analyzed can be viewed as an information channel,
as shown in Figure 1. It takes two inputs: the secret data S and a set of one or more random numbers
R. These are processed to output an observable event O. The goal of the randomized protocol is to
make it difficult to infer the value of the secret data from the observable events. Thus, our task is
to characterize how much information about S is leaked through O.




[Diagram: Secret and random inputs → Randomized Protocol → Observable]

Fig. 1. Randomized Protocol Input/Output
3.1 Information Theory



Analysis of probabilistic systems relies on a number of tools and concepts used in information theory 
to reason about the uncertainty of some data and the amount of information it can reveal about some 
other data which was used in its computation. 

An important notion borrowed from information theory is that of entropy. If X is a random variable,
then H(X) denotes its entropy, defined as $H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$, where
$\mathcal{X}$ denotes the domain of X. Entropy measures the uncertainty of the random variable. If the
logarithm is taken to base 2, entropy also measures the number of bits required to encode X.
Intuitively, if k bits are used to encode a variable, it can take 2^k possible values, and if these
values are uniformly distributed its entropy is exactly k bits. Thus, the entropy is a measure of the
information content of X.
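As a small illustration of the definition above, the following Python sketch computes H(X) in bits from a probability vector, and a plug-in estimate from observed samples. The function names are our own; the plug-in estimator is simply the simplest choice and is known to be biased downwards for small sample sizes.

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy H(X) = -sum_x p(x) log2 p(x), in bits.
    Outcomes with p(x) = 0 contribute nothing (0 log 0 is taken as 0)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def empirical_entropy(samples):
    """Plug-in estimate of H(X) from observed samples of X."""
    n = len(samples)
    return entropy(c / n for c in Counter(samples).values())

# A uniform variable over 2**k values has entropy exactly k bits.
print(entropy([0.25] * 4))  # 2.0
```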

Another important notion is that of relative entropy between two probability distributions p and q,
defined as the Kullback-Leibler distance $D(p\|q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}$.
This distance is always non-negative and is 0 if and only if p = q. Intuitively, it measures the
expected difference in the number of bits required to code samples from p when using a code based on
p and when using a code based on q. In coding theory, the KL divergence is interpreted as the expected
extra message length per datum that must be communicated if a code that is optimal for a wrong
distribution q is used, compared to using a code based on the true distribution p.
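A direct transcription of D(p‖q) for finite distributions, given as a sketch (the function name is ours). The example also shows that the Kullback-Leibler distance is not symmetric in p and q, so it is not a metric in the strict sense.

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) log2(p(x) / q(x)) in bits.

    p and q are sequences of probabilities over the same outcomes. Terms with
    p(x) = 0 contribute 0; q(x) = 0 where p(x) > 0 makes the divergence infinite."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue
        if qx == 0:
            return math.inf
        total += px * math.log2(px / qx)
    return total

p = [0.5, 0.5]
q = [0.75, 0.25]
print(kl_divergence(p, p))                        # 0.0: identical distributions
print(kl_divergence(p, q), kl_divergence(q, p))   # both positive, but not equal
```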

A related concept is the conditional entropy of two random variables X and Y,
$H(X|Y) = -\sum_{y \in \mathcal{Y}} p(y) \sum_{x \in \mathcal{X}} p(x|y) \log p(x|y)$, which measures the
uncertainty of X after Y is observed. It summarizes the extra information in X which cannot be inferred
from Y. It is maximal, with H(X|Y) = H(X), when X and Y are independent, in which case the uncertainty
of X remains unchanged on observing Y. It is minimal, with H(X|Y) = 0, when X is a deterministic
function of Y, that is, x = f(y).

The change in the uncertainty of X on observing Y is given by the difference between the entropy of
X and the conditional entropy of X given Y. This quantity is called the mutual information and is
defined as $I(X;Y) = H(X) - H(X|Y)$. A little algebra shows that $I(X;Y) = I(Y;X)$. This is the
main quantity of interest to us.
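The following sketch computes I(X;Y) = H(X) - H(X|Y) from a joint probability table, using the conditional entropy formula above, and checks the symmetry I(X;Y) = I(Y;X) numerically. The table layout joint[x][y] = p(x, y) and the function names are assumptions made for illustration.

```python
import math

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y) in bits, from a table joint[x][y] = p(x, y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    h_x = sum(-p * math.log2(p) for p in px if p > 0)
    # H(X|Y) = -sum_y p(y) sum_x p(x|y) log2 p(x|y), where p(y) p(x|y) = p(x, y)
    h_x_given_y = 0.0
    for j, p_y in enumerate(py):
        for i in range(len(px)):
            pxy = joint[i][j]
            if pxy > 0:
                h_x_given_y -= pxy * math.log2(pxy / p_y)
    return h_x - h_x_given_y

def transpose(joint):
    """Swap the roles of X and Y in the joint table."""
    return [list(col) for col in zip(*joint)]

# Y is a noisy copy of a fair bit X, flipped with probability 0.1.
joint = [[0.45, 0.05],
         [0.05, 0.45]]
print(round(mutual_information(joint), 4))  # 0.531
print(math.isclose(mutual_information(joint),
                   mutual_information(transpose(joint))))  # True: I(X;Y) = I(Y;X)
```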

3.2 Channel Capacity 

Randomized protocols can be viewed as a lossy communication channel fSl from S to O and can be
represented as a tuple $(\mathcal{S}, \mathcal{O}, p(\cdot|\cdot))$, where p is the conditional
probability distribution of the observation given the secret. The mutual information between S and
the observation O defines the flow of information across this channel. This definition of information
flow does not fix any probability distribution over the secrets S. The channel capacity is defined to
be the maximum flow possible across this channel over all distributions of the secrets. Thus, the
channel capacity of the randomized protocol is $\max_{p(s)} I(S;O)$, that is,

$$\max_{p(s)} I(S;O) = \max_{p(s)} \{H(S) - H(S|O)\} = \max_{p(s)} \sum_{s} \sum_{o} p(s)\, p(o|s) \log \frac{p(s|o)}{p(s)}$$

where s and o range over the values taken by the secret and observable variables.
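To make the maximization concrete, here is a sketch that evaluates the sum above for a given p(s) and a channel matrix channel[s][o] = p(o|s), obtaining p(s|o) by Bayes' rule, and then maximizes over p(s) by brute-force grid search. The grid search only works for a two-valued secret and is purely illustrative (our assumption, not the method of this paper). The example is a Z-channel, whose capacity for this flip probability is log2(1.25) ≈ 0.3219 bits.

```python
import math

def mi_from_channel(p_s, channel):
    """I(S;O) = sum_s sum_o p(s) p(o|s) log2(p(s|o) / p(s)),
    for channel[s][o] = p(o|s) and p_s[s] = p(s)."""
    n_o = len(channel[0])
    p_o = [sum(p_s[s] * channel[s][o] for s in range(len(p_s))) for o in range(n_o)]
    total = 0.0
    for s, ps in enumerate(p_s):
        for o in range(n_o):
            joint = ps * channel[s][o]
            if joint > 0:
                p_s_given_o = joint / p_o[o]          # Bayes' rule
                total += joint * math.log2(p_s_given_o / ps)
    return total

def capacity_binary_grid(channel, steps=10000):
    """Brute-force max of I(S;O) over p(s) = (a, 1 - a) for a two-valued secret."""
    best = 0.0
    for i in range(steps + 1):
        a = i / steps
        best = max(best, mi_from_channel([a, 1 - a], channel))
    return best

# Z-channel: secret 0 always yields observation 0; secret 1 is flipped half the time.
z = [[1.0, 0.0],
     [0.5, 0.5]]
print(round(capacity_binary_grid(z), 4))  # 0.3219
```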

The expression maximized over p(s) above involves two further distributions, p(o|s) and
p(s|o).

We note that algebraic manipulation does not reduce the degrees of freedom of this expression; it
always involves two unknown distributions. For example, if we use $p(s|o) = p(o|s)\,p(s)/p(o)$ to
rewrite the above expression as

$$\max_{p(s)} I(S;O) = \max_{p(s)} \{H(S) - H(S|O)\} = \max_{p(s)} \sum_{s} \sum_{o} p(s)\, p(o|s) \log \frac{p(o|s)}{p(o)}$$

the resulting expression is still over two distributions, p(o|s) and p(o).
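As a quick numerical check of this rewriting step (a sketch using randomly generated distributions, not part of the original analysis), both forms of the sum agree once p(s|o) is obtained from p(o|s) and p(s) by Bayes' rule.

```python
import math
import random

def output_distribution(p_s, channel):
    """p(o) = sum_s p(s) p(o|s), with channel[s][o] = p(o|s)."""
    return [sum(p_s[s] * channel[s][o] for s in range(len(p_s)))
            for o in range(len(channel[0]))]

def mi_posterior_form(p_s, channel):
    """sum_s sum_o p(s) p(o|s) log2(p(s|o) / p(s)), p(s|o) by Bayes' rule."""
    p_o = output_distribution(p_s, channel)
    total = 0.0
    for s, ps in enumerate(p_s):
        for o, w in enumerate(channel[s]):
            if ps * w > 0:
                p_s_given_o = ps * w / p_o[o]
                total += ps * w * math.log2(p_s_given_o / ps)
    return total

def mi_likelihood_form(p_s, channel):
    """sum_s sum_o p(s) p(o|s) log2(p(o|s) / p(o))."""
    p_o = output_distribution(p_s, channel)
    total = 0.0
    for s, ps in enumerate(p_s):
        for o, w in enumerate(channel[s]):
            if ps * w > 0:
                total += ps * w * math.log2(w / p_o[o])
    return total

def random_distribution(n, rng):
    """A random probability vector of length n with strictly positive entries."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(42)
p_s = random_distribution(3, rng)
channel = [random_distribution(4, rng) for _ in range(3)]
print(math.isclose(mi_posterior_form(p_s, channel),
                   mi_likelihood_form(p_s, channel), abs_tol=1e-12))  # True
```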

Our goal is to estimate the channel capacity of the randomized protocol which would characterize 
the maximum secrecy loss. For this we will need to estimate the above two distributions for a given 
randomized protocol. 

We choose to estimate p(s|o) and p(o|s) since:
- estimating p(o) without knowledge of p(s) (except that we know this distribution p(s) maximizes
the mutual information) would be difficult;
- the above expression can be shown to be concave in p(s|o) and in p(s) separately, and hence
maximizing it using these as parameters can be accomplished with iterative optimization techniques.
For cases where the distribution p(s) which maximizes the mutual information is known, the problem
reduces to finding p(o|s), since the joint probability distribution can then be derived and the
marginals and conditionals computed from it. In general, it is not possible to find analytically the
p(s) at which the maximum is attained.



3.3 Learning P(O|S)

Our approach to learning the conditional distribution p(o|s) is to construct a Bayesian network
with nodes $\mathcal{S} \cup \mathcal{O}$ and edges from $\mathcal{S}$ to $\mathcal{O}$. A Bayesian network
is a probabilistic graphical model that represents the probabilistic dependencies over a set of random
variables. It is a graphical representation of a factorization of the joint probability distribution
in which, for each node, only a conditional probability table is maintained, giving the conditional
probability of the random variable at that node given the values of the random variables at its parent
nodes. In this particular instance, each observable variable depends on one or more secret variables,
which are its parents. Efficient algorithms exist for inference and learning in Bayesian networks.
Since the protocol is only available to us as a black-box implementation, we need to learn the edge
connectivity of the corresponding Bayesian network to find which secrets determine which observable
variables. This problem of learning the structure of Bayesian networks from traces has been previously
studied in the context of learning genetic networks [14,21]. We adapt that approach to our problem
and summarize it here; a more detailed discussion of the technique is available in [14]. One piece of
domain knowledge that we exploit is that there are no dependencies among the secret variables or among
the observable variables; that is, the observable variables are conditionally independent of each other
given the secret variables. This reduces the search space of possible structures of the Bayesian network.
The structure learning algorithm uses mutual information measures between the observable variables
and the secret variables. The algorithm begins by identifying observable variables which depend on
only one secret variable. If the mutual information $M(O_i, S_j)$ equals $H(O_i)$, then the observable
variable $O_i$ depends only on $S_j$, and there is a single edge ending at $O_i$, namely the one from
$S_j$ to $O_i$. After identifying all nodes which have a single incoming edge, the algorithm proceeds to
identify observable variables which depend on two secret variables. The approach is similar: if the
mutual information $M(O_i, [S_j, S_k])$ equals the entropy $H(O_i)$, then there are two edges ending at
$O_i$, starting from $S_j$ and $S_k$. The process continues until the maximum bound d on the number of
incoming edges, which is received as input, is reached. The algorithm is described in Algorithm 1.
Conservatively, the max-degree bound d can be set to the larger of the number of secret variables and
the number of observable variables.

Input: Max-degree d, secrets S, observables O, and a set of traces T over these variables
Output: Mapping of each observable to the secrets it depends on
O_r := O;
foreach observable O_j in O do
    Calculate H(O_j) from T;
end
foreach k in 1 to d do
    foreach observable O_j in O_r do
        foreach k-size candidate subset S_c of S do
            Calculate M(S_c, O_j) from T;
            if M(S_c, O_j) = H(O_j) then
                Remove O_j from O_r;
                Record an edge from every S_i in S_c to O_j;
                break;
            end
        end
    end
end
assert (O_r is empty);

Algorithm 1: Bayesian Network Structure Learning Algorithm
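A minimal Python sketch of Algorithm 1, under these assumptions: traces are given as a list of dictionaries mapping variable names to discrete values, mutual information and entropy are plug-in estimates from the traces, and the test M(S_c, O_j) = H(O_j) uses a small tolerance tol to absorb floating-point error. As written, that equality holds exactly only when O_j is a deterministic function of the candidate secrets; with noisy traces tol would have to be relaxed, which the text does not specify. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, product

def entropy_of(values):
    """Plug-in entropy (bits) of a list of hashable samples."""
    n = len(values)
    return sum(-c / n * math.log2(c / n) for c in Counter(values).values())

def mutual_info(xs, ys):
    """Plug-in I(X;Y) = H(X) + H(Y) - H(X,Y) from paired samples."""
    return entropy_of(xs) + entropy_of(ys) - entropy_of(list(zip(xs, ys)))

def learn_structure(traces, secrets, observables, max_degree, tol=1e-9):
    """Sketch of Algorithm 1: map each observable to a tuple of parent secrets."""
    remaining = list(observables)                              # O_r := O
    h = {o: entropy_of([t[o] for t in traces]) for o in observables}
    parents = {}
    for k in range(1, max_degree + 1):
        for o in list(remaining):                              # copy: we remove while looping
            ys = [t[o] for t in traces]
            for subset in combinations(secrets, k):            # k-size candidate subsets S_c
                xs = [tuple(t[s] for s in subset) for t in traces]
                if abs(mutual_info(xs, ys) - h[o]) <= tol:     # M(S_c, O_j) = H(O_j)
                    parents[o] = subset                        # edges S_i -> O_j
                    remaining.remove(o)
                    break
    assert not remaining, f"unexplained observables: {remaining}"
    return parents

# Toy traces: O1 = S1 XOR S2 and O2 = S3, over all secret combinations.
traces = [{"S1": a, "S2": b, "S3": c, "O1": a ^ b, "O2": c}
          for a, b, c in product([0, 1], repeat=3)]
print(learn_structure(traces, ["S1", "S2", "S3"], ["O1", "O2"], max_degree=2))
# {'O2': ('S3',), 'O1': ('S1', 'S2')}
```

On these toy traces, O2 is explained by S3 alone at k = 1, while O1 carries no information about either S1 or S2 individually and is only explained at k = 2.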
Once the Bayesian network structure has been learnt, the conditional probability values for each
observable variable are learnt using maximum likelihood estimates (MLE) [12111]. Let θ denote the
unknown parameters of the conditional probability distribution; then the maximum likelihood estimate
of θ is given by $\hat{\theta} = \arg\max_{\theta} P(T|\theta)$. For the discrete conditional probability
tables used here, the MLE is the relative frequency count, $\hat{p}(o|s) = N(s,o)/N(s)$, where $N(\cdot)$
counts occurrences in the traces T.
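For discrete variables the MLE reduces to counting. A sketch (function name ours) that estimates the conditional probability table p(o|s) from (secret, observable) pairs; when an observable has several parent secrets, s can simply be the tuple of their values.

```python
from collections import Counter, defaultdict

def mle_cpt(pairs):
    """MLE of p(o | s) from (s, o) samples: p_hat(o | s) = N(s, o) / N(s)."""
    joint = Counter(pairs)                   # N(s, o)
    n_s = Counter(s for s, _ in pairs)       # N(s)
    cpt = defaultdict(dict)
    for (s, o), count in joint.items():
        cpt[s][o] = count / n_s[s]
    return dict(cpt)

# Hypothetical traces of (secret, observable) pairs.
samples = [("a", 0), ("a", 0), ("a", 1), ("b", 1)]
cpt = mle_cpt(samples)
print(cpt["b"])  # {1: 1.0}
```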

3.4 Estimating P(S|O) for mutual information

The second distribution that we need to learn in order to compute the mutual information is P(S|O).
We present two approaches to computing it. First, we consider a special case in which the conditional
probability distribution is symmetric, and then we show a more general approach. The special case was
used in the probabilistic model checking approach [5]. The more general approach is used in coding
theory to compute the capacity of noisy channels.

Exploiting row symmetry of P(O|S). We consider the particular case in which the rows of P(O|S)
are permutations of each other. In this case,

$$I(S;O) = H(O) - H(O|S), \quad \text{where } H(O|S) = \sum_{s} p(s)\, H(O|S=s).$$

By row symmetry, $H(O|S=s) = H(O|S=s') = H_r$ for all s, s', where $H_r$ denotes the entropy of any
single row of P(O|S). Thus,

$$I(S;O) = H(O) - H_r.$$

So, in this special case, P(S|O) need not be calculated explicitly, and the mutual information can be
computed directly using the symmetry.
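The sketch below (our own, with a made-up circulant channel) computes I(S;O) both with the general formula H(O) - Σ_s p(s) H(O|S=s) and with the row-symmetric shortcut H(O) - H_r, and confirms that they agree when the rows of p(O|S) are permutations of one another. The shortcut rejects channels that are not row-symmetric.

```python
import math

def entropy(probs):
    """Entropy in bits; 0 log 0 is taken as 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def output_distribution(p_s, channel):
    """p(o) = sum_s p(s) p(o|s), for channel[s][o] = p(o|s)."""
    return [sum(ps * row[o] for ps, row in zip(p_s, channel))
            for o in range(len(channel[0]))]

def mi_general(p_s, channel):
    """I(S;O) = H(O) - sum_s p(s) H(O | S = s); no symmetry assumed."""
    return (entropy(output_distribution(p_s, channel))
            - sum(ps * entropy(row) for ps, row in zip(p_s, channel)))

def mi_row_symmetric(p_s, channel):
    """I(S;O) = H(O) - H_r, valid only when all rows of p(O|S) are permutations
    of one another, so that every row has the same entropy H_r."""
    reference = sorted(channel[0])
    for row in channel:
        if not all(math.isclose(a, b) for a, b in zip(sorted(row), reference)):
            raise ValueError("rows of p(O|S) are not permutations of each other")
    return entropy(output_distribution(p_s, channel)) - entropy(channel[0])

# A row-symmetric (circulant) channel over three secrets and three observations.
channel = [[0.7, 0.2, 0.1],
           [0.1, 0.7, 0.2],
           [0.2, 0.1, 0.7]]
p_s = [0.5, 0.3, 0.2]
print(math.isclose(mi_general(p_s, channel), mi_row_symmetric(p_s, channel)))  # True
```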

Inference using iterative maximization. In the general case, we need iterative optimization techniques
to compute the channel capacity by maximizing the mutual information over p(S). An example of such a
technique is the Arimoto-Blahut (AB) algorithm. A detailed discussion of this algorithm and its extensions
is presented in [19]. It is essentially a Bayesian approach in which the parameter is itself treated as a
random variable. The unknown distributions p(S|O) and p(S) are treated as two different variables
(coordinates) over which the maximum needs to be attained. The AB algorithm is an iterative technique
with the following update rules:

- $p^{t+1}(s|o) = \dfrac{p^{t}(s)\, p(o|s)}{\sum_{s'} p^{t}(s')\, p(o|s')}$

- $p^{t+1}(s) = \dfrac{\prod_{o} p^{t+1}(s|o)^{p(o|s)}}{\sum_{s'} \prod_{o} p^{t+1}(s'|o)^{p(o|s')}}$

Since the objective is concave in each of the two parameters separately, the iteration converges to the
maximum over p(s), which is the channel capacity of the randomized protocol.
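A sketch of the Arimoto-Blahut iteration in Python, implementing the two update rules above for a channel matrix channel[s][o] = p(o|s). The starting point (uniform p(s)), the stopping rule and the tolerance are our choices: we stop when the standard Blahut lower and upper bounds on the capacity, log Σ_s p(s) exp(D_s) and max_s D_s with D_s = D(p(·|s) ‖ p(o)), are within tol of each other. The second update is evaluated in log space to avoid underflow.

```python
import math

def blahut_arimoto(channel, tol=1e-9, max_iter=10_000):
    """Capacity (bits) of channel[s][o] = p(o|s) by the Arimoto-Blahut iteration.

    Returns (capacity, p_s), where p_s approximately attains the maximum."""
    n_s, n_o = len(channel), len(channel[0])
    p_s = [1.0 / n_s] * n_s                          # start from a uniform p(s)
    lower = 0.0
    for _ in range(max_iter):
        p_o = [sum(p_s[s] * channel[s][o] for s in range(n_s)) for o in range(n_o)]
        # D_s = D(p(.|s) || p(o)) in nats; log sum_s p(s) e^{D_s} <= C <= max_s D_s
        d = [sum(w * math.log(w / p_o[o]) for o, w in enumerate(channel[s]) if w > 0)
             for s in range(n_s)]
        lower = math.log(sum(p_s[s] * math.exp(d[s]) for s in range(n_s)))
        if max(d) - lower < tol:
            break
        # Rule 1: posterior p(s|o) = p(s) p(o|s) / sum_s' p(s') p(o|s')
        post = [[p_s[s] * channel[s][o] / p_o[o] if p_o[o] > 0 else 0.0
                 for o in range(n_o)] for s in range(n_s)]
        # Rule 2: p(s) proportional to prod_o p(s|o)^{p(o|s)}, evaluated in log space
        log_w = [sum(w * math.log(post[s][o]) for o, w in enumerate(channel[s]) if w > 0)
                 for s in range(n_s)]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        total = sum(weights)
        p_s = [w / total for w in weights]
    return lower / math.log(2), p_s

capacity, p_s = blahut_arimoto([[1.0, 0.0], [0.5, 0.5]])   # Z-channel
print(round(capacity, 4))  # 0.3219
```

For a symmetric channel such as the binary symmetric channel, the uniform starting point is already optimal and the loop exits on the first iteration.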

4 Experiments and Results 

We validated our statistical technique by comparing it with probabilistic model checking on two
examples: the dining cryptographers protocol and the Crowds protocol. These examples were chosen
because their PRISM [13] models (MDPs) are available from the PRISM tool website².

4.1 Dining Cryptographer 

Suppose k cryptographers are dining. Either one of them or the host pays the bill. They have agreed
not to reveal who paid if the bill was paid by one of the cryptographers; they only want to find out
whether the host or one of the cryptographers paid. Each cryptographer tosses a coin and compares it
with the coin of the cryptographer on his left. If the coins match, he announces 1 if he is paying and
0 if he is not; if the coins do not match, he announces 0 if he is paying and 1 if he is not. If no
cryptographer paid, the exclusive or of all announcements is
$(a_1 \oplus a_2) \oplus \dots \oplus (a_i \oplus a_{i+1}) \oplus \dots \oplus (a_k \oplus a_1) = 0$,
where $a_i$ is the outcome of the i-th coin. If a cryptographer paid, the exclusive or yields 1. Thus,
this protocol provides a channel that allows an announcement from a sender while maintaining his
anonymity. However, this secrecy guarantee relies on the fairness of the coins used by the
cryptographers: an attacker can bias the coins and then try to learn which cryptographer paid.

² http://www.prismmodelchecker.org/

We conducted three sets of experiments, with k = 3, 4 and 5, for the above protocol. In each set, we
varied the heads probability of the coins from 0.0 to 1.0 in increments of 0.1, and estimated the
channel capacity of the protocol using both probabilistic model checking and our technique. The
corresponding plots are presented in Figure 2. Each plot shows the channel capacity computed using
probabilistic model checking as a continuous line and the statistically inferred values as points.
The plots show that, for this case study, the statistical estimates are very close to the channel
capacity computed by model checking.




[Plot, panel (c) k=5, "Dining Cryptographers Protocol - 5": actual and estimated channel capacity against coin bias (0.5 = fair)]

Fig. 2. Dining Cryptographers with varying number of cryptographers (k)



The channel capacity is observed to be minimal, namely 0, when the cryptographers use a fair coin.
This corresponds to the strong anonymity guarantee of the protocol when it is implemented using a fair
coin. As the coin is made unfair, the channel capacity rises, and it attains its maximum value, which
is close to the logarithm of the number of cryptographers, when the coin is completely biased.
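This trend can be reproduced with a small simulation. The sketch below is a hypothetical stand-in for a black-box implementation: it simulates the announcements of k cryptographers whose coins come up heads with probability p_heads, draws the paying cryptographer uniformly (the host-paid case is omitted, an assumption), and computes a plug-in estimate of I(S;O) from the sampled (secret, observable) pairs. The estimate under a uniform secret is not in general the channel capacity; it is used here only to illustrate the two extremes, close to 0 for a fair coin and close to log2(k) for a completely biased one.

```python
import math
import random
from collections import Counter

def dc_announcements(k, payer, p_heads, rng):
    """One run of the dining cryptographers protocol (hypothetical simulator).

    Cryptographer i compares his coin with that of neighbour (i + 1) % k;
    he announces 0 on a match and 1 otherwise, inverted if he is the payer."""
    coins = [rng.random() < p_heads for _ in range(k)]   # True = heads
    out = []
    for i in range(k):
        bit = 0 if coins[i] == coins[(i + 1) % k] else 1
        if i == payer:
            bit ^= 1
        out.append(bit)
    return tuple(out)

def plug_in_mi(pairs):
    """Plug-in estimate of I(S;O) in bits from a list of (s, o) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    n_s = Counter(s for s, _ in pairs)
    n_o = Counter(o for _, o in pairs)
    return sum(c / n * math.log2(c * n / (n_s[s] * n_o[o]))
               for (s, o), c in joint.items())

def dc_leakage(k, p_heads, samples=20000, seed=0):
    """Estimate I(payer; announcements) with the payer drawn uniformly (assumption)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(samples):
        payer = rng.randrange(k)
        pairs.append((payer, dc_announcements(k, payer, p_heads, rng)))
    return plug_in_mi(pairs)

for bias in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(bias, round(dc_leakage(3, bias), 3))
```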

4.2 Crowds Protocol 

Crowds: This protocol is meant to ensure that a message is transmitted from a source to a destination
without revealing the identity of the source. The initiator randomly selects a node and sends it the
message. Each node on the path then forwards the message to another randomly selected node with some
fixed forwarding probability, and otherwise delivers it to the destination. Some nodes may be
dishonest: they do not forward messages, and they record the node which came before them in the path.
As paths are rebuilt, the corrupt nodes will very likely see the sender node as a predecessor more
often than other nodes. The plots in Figure 3 and Figure 4 assume that the attacker is on the path
from the original source to the destination.



[Plots, "Crowds Protocol": (a) 1 corrupt node, x-axis probability of resetting a route; (b) 2 corrupt nodes, x-axis forwarding probability]

Fig. 3. Crowds Protocol with 10 honest nodes



We consider two sets of experiments: the first with 10 honest nodes and a varying number of dishonest
nodes (1, 2, 5 and 10), and the second with 100 honest nodes and a varying number of dishonest nodes
(10, 20, 50 and 100). We plot the mutual information against the forwarding probability. While our
statistical estimates are close to the values computed by probabilistic model checking for the smaller
crowds, the deviation increases for larger crowds, suggesting that the sampling was not uniform.
The plots show that the maximum information is obtained when the forwarding probability is minimal.
The maximum information corresponds to the base 2 logarithm of the total number of honest nodes in
the crowd, that is, all identities are revealed.
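A similar sketch for the Crowds protocol, again a hypothetical simulator rather than the PRISM model: honest members are numbered 0..n_honest-1 and the rest are corrupt, the initiator always forwards once to a uniformly chosen member, every honest member forwards with probability p_forward and otherwise delivers, and the first corrupt member on the path records its predecessor and ends the path. As in Figures 3 and 4, only runs in which an attacker is on the path are kept. With p_forward = 0 the predecessor seen by the attacker is always the initiator, so the estimate approaches log2(n_honest), matching the observation above.

```python
import math
import random
from collections import Counter

def crowds_observation(n_honest, n_corrupt, p_forward, initiator, rng):
    """Simulate one Crowds path (hypothetical model).

    Members 0..n_honest-1 are honest; the remaining n_corrupt are corrupt.
    Returns the predecessor seen by the first corrupt member on the path,
    or None if an honest member delivers the message first."""
    n = n_honest + n_corrupt
    prev, cur = initiator, rng.randrange(n)      # initiator always forwards once
    while True:
        if cur >= n_honest:                      # corrupt: record predecessor, stop
            return prev
        if rng.random() >= p_forward:            # honest member delivers
            return None
        prev, cur = cur, rng.randrange(n)        # forward to a random member

def plug_in_mi(pairs):
    """Plug-in estimate of I(S;O) in bits from a list of (s, o) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    n_s = Counter(s for s, _ in pairs)
    n_o = Counter(o for _, o in pairs)
    return sum(c / n * math.log2(c * n / (n_s[s] * n_o[o]))
               for (s, o), c in joint.items())

def crowds_leakage(n_honest, n_corrupt, p_forward, samples=50000, seed=1):
    """Estimate I(initiator; observed predecessor), keeping only runs in
    which a corrupt member is on the path."""
    if p_forward >= 1 and n_corrupt == 0:
        raise ValueError("paths would never terminate")
    rng = random.Random(seed)
    pairs = []
    for _ in range(samples):
        initiator = rng.randrange(n_honest)      # uniform over honest members (assumption)
        seen = crowds_observation(n_honest, n_corrupt, p_forward, initiator, rng)
        if seen is not None:
            pairs.append((initiator, seen))
    return plug_in_mi(pairs)

for p in (0.0, 0.5, 0.8):
    print(p, round(crowds_leakage(10, 1, p), 3))
```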



5 Limitations 



We identify some key weaknesses of our approach. 
[Plots, "Crowds Protocol", x-axis forwarding probability: (a) 10 corrupt nodes; (b) 20 corrupt nodes; (c) 50 corrupt nodes; (d) 100 corrupt nodes]

Fig. 4. Crowds Protocol with 100 honest nodes
1. We need to evaluate our technique on non-symmetric protocols. Currently, the AB algorithm is not
implemented, and hence we can only handle symmetric protocols.

2. We assume that the source of randomization is of the same quality during testing and deployment.
However, most errors in implementations of security protocols arise due to poor randomization in
deployment.

3. In cases where we can identify that the anonymity guarantees provided by an implementation are not
sufficient, we do not provide any information about what was wrong with the implementation.
In contrast, probabilistic model checking can localize errors in a protocol.

4. Unlike formal techniques, which provide guarantees about their correctness, our statistical approach
lacks any such guarantee. It relies heavily on randomized sampling of the traces of the implementation.

6 Conclusion and Future Work 

The statistical approach to analyzing security protocols seems promising. We plan to compare our
technique more extensively with probabilistic model checking, as well as with other techniques for
analyzing randomized protocols. We identify some directions in which further work can be done:

1. Implementations can be partially observable, and hence, for more involved protocols, we can use
more than just the secret and observable variables in our traces.

2. This technique is not limited to security protocols and can also be extended to quantify the
information flow in a program.

3. This analysis technique can be used to identify protocol parameters for some intended degree of
secrecy. For example: how many nodes in a crowd should be corrupt such that the protocol is
exactly k-anonymous.